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Abstract 

Three object-oriented implementations of a prototype solver of the advection equation are introduced. The presented programs 
are based on Blitz++ (C++), NumPy (Python), and Fortran's built-in array containers. The solvers include an implementation 
of the Multidimensional Positive-Definite Advective Transport Algorithm (MPDATA). The introduced codes exemplify how the 
application of object-oriented programming (OOP) techniques allows to reproduce the mathematical notation used in the literature 
within the program code. A discussion on the tradeoffs of the programming language choice is presented. The main angles 
of comparison are code brevity and syntax clarity (and hence maintainability and auditability) as well as performance. In the 
case of Python, a significant performance gain is observed when switching from the standard interpreter (CPython) to the PyPy 
implementation of Python. Entire source code of all three implementations is embedded in the text and is licensed under the terms 
of the GNU GPL license. 

^Keywords: object-oriented programming, advection equation, MPDATA, C++, Fortran, Python 
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1. Introduction 

Object oriented programming (OOP) "has become recog- 
nised as the almost unique successful paradigm for creating 
complex software" [1, Sec. 1.3]. It is intriguing that, while the 
quoted statement comes from the very book subtitled The Art of 
Scientific Computing, hardly any (if not none) of the currently 
operational weather and climate prediction systems - flagship 
examples of complex scientific software - make extensive use 
of OOP techniques. Fortran has been the language of choice in 
oceanic J2l, weather-prediction and Earth system [4] mod- 
elling, and none of its 20-century editions were object-oriented 
languages [see e.g.[l, for discussion]. 

Application of OOP techniques in development of numerical 
modelling software may help to: 

(i) maintain modularity and separation of program logic lay- 
ers (e.g. separation of numerical algorithms, parallelisation 
mechanisms, data input/output, error handling and the de- 
scription of physical processes); and 

(ii) shorten and simplify the source code and improve its 
readability by reproducing within the program logic the 
mathematical notation used in the literature. 

The first application is attainable, yet arguably cumbersome, 
with procedural programming. The latter, virtually impossible 
to obtain with procedural programming, is the focus of this pa- 
per. It also enables the compiler or library authors to relieve 
the user (i.e. scientific programmer) from hand-coding optimi- 
sations, a practice long recognised as having a strong negative 
impact when debugging and maintenance are considered fl. 
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MPDATA 0] stands for Multidimensional Positive Definite 
Advective Transport Algorithm and is an example of a numer- 
ical procedure used in weather, climate and ocean simulation 
systems [e.g.[H|9|,[lol respectively]. MPDATA is a solver for 
systems of advection equations of the following form: 



d,iff = -V ■ (vt/r) 



(1) 



that describe evolution of a scalar field iff transported by the 
fluid flow with velocity v. Quoting Numerical Recipes once 
more, development of methods to numerically solve such prob- 
lems "is an art as much as a science" |l|, Sec. 20.1], and MP- 
DATA is an example of the state-of-the art in this field. MP- 
DATA is designed to accurately solve equation (Q3 in an ar- 
bitrary number of dimensions assuring positive-definiteness of 
scalar field iff and incurring small numerical diffusion. All rele- 
vant MPDATA formulae are given in the text but are presented 
without derivation or detailed discussion. For a recent review 



of MPDATA-based techniques see Smolarkiewicz Hi lL and ref- 
erences therein]. 

In this paper we introduce and discuss object-oriented imple- 
mentations of an MPDATA-based two-d imension al (2D) advec- 
tion equation solver written in C ++ 11 JlSO/IECl 14882:2011), 
Python ill and Fortran 2008 dlSO/IEd 1539-1:20101. In 
the following section we introduce the three implementations 
briefly describing the algorithm itself and discussing where and 
how the OOP techniques may be applied in its implementa- 
tion. The syntax and nomenclature of OOP techniques are used 
without introduction, for an overview of OOP in context of 
C++, Python and Fortran, consult for example [15, Part II], 
HH Chapter 5] and Chapter 11], respectively. The third 
section of this paper covers performance evaluation of the three 
implementations. The fourth section covers discussion of the 
tradeoffs of the programming language choice. The fifth sec- 
tion closes the article with a brief summary. 

Throughout the paper we present the three implementations 
by discussing source code listings which cover the entire pro- 
gram code. Subsections 12. Hl2~6l describe all three implemen- 
tations, while subsequent sections [2/7H2. 121 cover discussion of 
C++ code only. The relevant parts of Python and Fortran codes 
do not differ significantly, and for readability reasons are pre- 
sented in |Appendix P| and |Appendix F| respectively. 

The entire code is licensed under the terms of the GNU Gen- 
eral Public License license version 3 II 1811 . 

All listings include line numbers printed to the left of the 
source code, with separate numbering for C++ (listings pre- 
fixed with C, black frame), 

listing CO (C++) 

// code licensed under the terms of GNU GPL v3 
// copyright holder: University of Warsaw 



Python (listings prefixed with P, blue frame) and 



listing P.O (Python) 



# code licensed under the terms of GNU GPL v3 

# copyright holder: University of Warsaw 



Fortran (listings prefixed with F, red frame). 

listing F.O (Fortran) 



code licensed under the terms of GNU GPL v3 
copyright holder: University of Warsaw 



Programming language constructs when inlined in the text are 
typeset in bold, e.g. GOTO 2. 



2. Implementation 

Double precision floating-point format is used in all three im- 
plementations. The codes begin with the following definitions: 

listing C.l (C++) 



3 typedef double real_t; 



listing P.l (Python) 



3 real_t = ' f loat64' 



listing F.l (Fortran) 



module real_m 
implicit none 

integer, parameter : : real_t = kind(O.dO) 
end module 



which provide a convenient way of switching to different preci- 
sion. 

All codes are structured in a way allowing compilation of the 
code in exactly the same order as presented in the text within 
one source file, hence every Fortran listing contains definition 
of a separate module. 

2.1. Array containers 

Solution of equation ([TJ using MPDATA implies discretisa- 
tion onto a grid of the iff and the Courant number C = v • ^ 
fields, where At is the solver timestep and Ax is the grid spac- 
ing. 

Presented C++ implementation of MPDATA is built upon 
the Blitz++ libraryQ. Blitz offers object-oriented representa- 
tion of n-dimensional arrays, and array-valued mathematical 
expressions. In particular, it offers loop-free notation for array 
arithmetics that does not incur creation of intermediate tempo- 
rary objects. Blitz++ is a header-only library^ - to use it, it 
is enough to include the appropriate header file, and optionally 
expose the required classes to the present namespace: 

listing C.2 (C++) 

^include <blitz /array . h> 
using arr_t = blitz : : Array<real_t, 2>; 
using rng_t = blit z :: Range; 
using idx_t = blit z : : RectDomain<2> ; 



Here arr_t, rng_t and idx_t serve as alias identifiers and are 
introduced in order to shorten the code. 

The power of Blitz++ comes from the ability to express ar- 
ray expressions as objects. In particular, it is possible to de- 
fine a function that returns an array expression; i.e. not the 
resultant array, but an object representing a „recipe" defining 
the operations to be performed on the arguments. As a conse- 
quence, the return types of such functions become unintelligi- 
ble. Luckily, the auto return type declaration from the C++1 1 
standard allows to simplify the code significantly, even more if 
used through the following preprocessor macro: 



'Blitz++ is a C++ class library for scientific computing which uses 
the expression templates technique to achieve high performance, see 
http : //sf . net/proj ects/blitz/ 

z Blitz++ requires linking with libblitz if debugging mode is used 
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listing C.3 (C++) 

^define return_macro (expr) \ 

-> decltype (safeToReturn (expr) ) \ 
{ return safeToReturn (expr) ; } 



The call to blitz ::safeToReturn() function is included in or- 
der to ensure that all arrays involved in the expression being 
returned continue to exist in the caller scope. For example, def- 
inition of a function returning its array-valued argument dou- 
bled, reads: auto f(arr_t x) return_macro(2*x). This is the 
only preprocessor macro defined herein. 

For the Python implementation of MPDATA the NumP>0 
package is used. In order to make the code compatible with 
both the standard CPython as well as the alternative PyPy im- 
plementation of Python 1 19], the Python code includes the fol- 
lowing sequence of import statements: 

listing P. 2 (Python) 



try: 

import numpypy 

from _numpypy . pypy import set_invalidation 

set_invalidation (False) 
except ImportError: 

pass 
import numpy 
try: 

numpy .seterr(all=' ignore') 
except AttributeError: 
pass 



First, the PyPy's built-in NumPy implementation named 
numpypy is imported if applicable (i.e. if running PyPy), 
and the lazy evaluation mode is turned on through the 
set_invalidation(False) call. PyPy's lazy evaluation obtained 
with the help of a just-in-time compiler enables to achieve an 
analogous to Blitz++ temporary-array-free handling of array- 
valued expressions (see discussion in section [3}. Second, to 
match the settings of C++ and Fortran compilers used herein, 
the NumPy package is instructed to ignore any floating-point 
errors, if such an option is available in the interpreteiQ The 
above lines conclude all code modifications that needed to be 
added in order to run the code with PyPy. 

Among the three considered languages only Fortran is 
equipped with built-in array handling facilities of practical use 
in high-performance computing. Therefore, there is no need 
for using an external package as with C++ and Python. Fortran 
array-handling features are not object-oriented, though. 

2.2. Containers for sequences of arrays 

As discussed above, discretisation in space of the scalar field 
i//(x, y) into its t/q, j] grid representation requires floating-point 
array containers. In turn, discretisation in time requires a con- 
tainer class for storing sequences of such arrays, i.e. {^"\ 
^.[n+i] j Similarly the components of the vector field C are in 
fact a {C w , 0*' } array sequence. 

Using an additional array dimension to represent the se- 
quence elements is not considered for two reasons. First, the 



and arrays constituting the sequence have different 
sizes (see discussion of the Arakawa-C grid in section 12.3) . 
Second, the order of dimensions would need to be different for 
different languages to assure that the contiguous dimension is 
used for one of the space dimensions and not for time levels. 

In the C++ implementation the Boosjf] ptr_vector class 
is used to represent sequences of Blitz ++ arrays and at the 
same time to handle automatic freeing of dynamically allo- 
cated memory. The ptr_vector class is further customised by 
defining a derived structure which element-access [ ] operator 
is overloaded with a modulo variant: 

listing C.4 (C++) 



^Include <boost/ptr_container/ptr_vector.hpp> 
struct arrvec_t : boost :: ptr_vector<arr_t> 
{ 

const arr_t &operator[] (const int i) const 



{ 



return this->at((i + this->size ( ) ) % this->size ( ) ) 



Consequently the last element of any such sequence may be 
accessed at index -1, the last but one at -2, and so on. 

In the Python implementation the built-in tuple type is used 
to store sequences of NumPy arrays. Employment of negative 
indices for handling from-the-end addressing of elements is a 
built-in feature of all sequence containers in Python. 

Fortran does not feature any built-in sequence container ca- 
pable of storing arrays, hence a custom arrvec_t type is intro- 
duced: 

listing F.2 (Fortran) 



module arrvec_m 
use real_m 
implicit none 

type : : arr_t 

real(real_t) , 
end type 



allocatable 



type : : arrptr_t 

class (arr_t ) , pointer : : p 
end type 

type : : arrvec_t 

class (arr_t ) , allocatable :: arrs ( : ) 
class (arrptr_t ) , allocatable :: at ( : ) 
integer : : length 
contains 

procedure : : ctor => arrvec_ctor 
procedure : : init => arrvec_init 
end type 

contains 

subroutine arrvec_ctor (this, n) 
class (arrvec_t ) :: this 
integer, intent(in) : : n 

this%length = n 
allocate(this%at ( -n : n-1 ) ) 
allocate(this%arrs { : n-1 ) ) 
end subroutine 

subroutine arrvec_init (this , n, i, j) 
class (arrvec_t ) , target :: this 
integer, intent (in) : : n 
integer, intent (in) :: i(2), j{2) 



3 NumPy is a Python package for scientific computing offering support 
for multi-dimensional arrays and a library of numerical algorithms, see 
http : //numpy . org/ 

4 numpy.seterr() is not supported in PyPy as of version 1 .9 



5 Boost is a free and open-source collection of peer-reviewed C++ libraries 
available at http : //www .boost . org/ Several parts of Boost have been inte- 
grated into or inspired new additions to the C++ standard. 
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allocate(this%arrs (n) %a { i (1) : i(2), j (1) 
this%at (n) %p => this%arrs (n) 
this%at {n - this%length) %p => this%arrs (n) 
end subroutine 
end module 



j (2) ) ) 



df.u v C^'._ in and Cy S . lyl to depict the grid values of the C 



The arr_t type is defined solely for the purpose of overcom- 
ing the limitation of lack of an array-of-arrays construct, and 
its only member field is a two-dimensional array. An array of 
arr_t is used hereinafter as a container for sequences of arrays. 

The arrptr_t type is defined solely for the purpose of over- 
coming Fortran's limitation of not supporting allocatables of 
pointers. arrptr_t's single member field is a pointer to an in- 
stance of arr_t. Creating an allocatable of arrptr_t, instead 
of a multi-element pointer of arr_t, ensures automatic memory 
deallocation. 

Type arrptr_t is used to implement the from-the-end ad- 
dressing of elements in arrvec_t. The array data is stored in 
the arrs member field (of type arr_t). The at member field (of 
type arrptr_t) stores pointers to the elements of arrs. at has 
double the length of arrs and is initialised in a cyclic manner 
so that the -1 element of at points to the last element of arrs, 
and so on. Assuming psi is an instance of arrptr_t, the (i j) 
element of the n-th array in psi may be accessed with 
psi%at( n)%p%a( i, j ). 

The ctor(n) method initialises the container for a given num- 
ber of elements n. The init(n,i j) method initialises the n-th ele- 
ment of the container with a newly allocated 2D array spanning 
indices i(l):i(2), and j(l):j(2) in the first, and last dimensions 
respectively^ 

2.3. Staggered grid 
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Figure 1 : A schematic of the Arakawa-C grid. 



The so-called Arakawa-C staggered grid 112011 depicted in 
Figure Q] is a natural choice for MPDATA. As a consequence, 
the discretised representations of the iff scalar field, and each 
component of the C — v ■ vector field in eq. (Q} are defined 
over different grid point locations. In mathematical notation 
this can be indicated by usage of fractional indices, e.g. \^ ^, 



In Fortran, when an array is passed as a function argument its base is locally 
set to unity, regardless of the setting at the caller scope. 



'[i+'AjT ^ILj-'M 



'[>,J+'A] 



vector components surrounding iff[i,j]. However, fractional in- 
dexing does not have a built-in counterpart in any of the em- 
ployed programming languages. A desired syntax would trans- 
late i - l /i to i — 1 and i + l /i to i. OOP offers a convenient way to 
implement such notation by overloading the + and - operators 
for objects representing array indices. 

In the C++ implementation first a global instance h of an 
empty structure hlf_t is defined, and then the plus and minus 
operators for hlf_t and rng_t are overloaded: 

listing C.5 (C++) 



struct hlf_t { } h; 

inline rng_t operators (const rng_t Si, const hlf_t &) 
t 

return i; 

J 

inline rng_t operator- (const rng_t Si, const hlf_t &) 
{ 

return i-1; 

J 



This way, the arrays representing vector field components can 
be indexed using (i+hj), (i-h,j) etc. where h represents the 
half. 

In NumPy in order to prevent copying of array data during 
slicing one needs to operate on the so-called array views. Array 
views are obtained when indexing the arrays with objects of the 
Python's built-it slice type (or tuples of such objects in case of 
multi-dimensional arrays). Python forbids overloading of oper- 
ators of built-in types such as slices, and does not define addi- 
tion/subtraction operators for slice and int pairs. Consequently, 
a custom logic has to be defined not only for fractional index- 
ing, but also for shifting the slices by integer intervals (i + 1). It 
is implemented here by declaring a shift class with the adequate 
operator overloads: 

listing P. 3 (Python) 



class shift ( ) : 

def init (self, plus, ir 

self. plus = plus 
self.mnus = mnus 

def radd (self, arg) : 

return type (arg) ( 

arg. start + self. plus, 
arg. stop + self. plus 

) 

def rsub (self, arg) : 

return type (arg) ( 

arg. start - self.mnus, 
arg. stop - self.mnus 



and two instances of it to represent unity and half in expressions 
like i+one, i+hlf, where i is an instance of slice 0: 



listing P. 4 (Python) 



one = shift (1, 1) 
hlf = shift (0, 1) 



In Fortran fractional array indexing is obtained through def- 
inition and instantiation of an object representing the half, and 
having appropriate operator overloads: 



7 One could argue that not using an own implementation of a slice- 
representing class in NumPy is a design flaw - being able to modify behaviour 
of a hypothetical numpy. slice class through inheritance would allow to imple- 
ment the same behaviour as obtained in listing P.3 without the need to represent 
the unity as a separate object 
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module arakawa_c_m 
implicit none 

type : : half_t 
end type 

type(half_t) : : h 

interface operator (+) 
module procedure ph 
end interface 

interface operator (-) 
module procedure mh 
end interface 

contains 

elemental function ph (i, h 
integer, intent(in) : : i 
type (half_t ) , intent(in) 
integer : return 
return = i 

end function 

elemental function mh (i, h 
integer, intent(in) : : i 
type(half_t) , intent(in) 
integer : return 
return = i - 1 

end function 
end module 



listing F.3 (Fortran) 



result (return) 



result (return) 



2.4. Halo regions 

The MPDATA formula; defining as a function of ^ 
(discussed in the following sections) feature terms such as 
ftu-ij-i]- One way of assuring validity of these formulae on the 
edges of the domain (e.g. for i=0) is to introduce the so-called 
halo region surrounding the domain. The method of populating 
the halo region with data depends on the boundary condition 
type. Employment of the halo-region logic implies repeated us- 
age of array range extensions in the code such as i ^> i ± halo. 

An ext() function is defined in all three implementation, in 
order to simplify coding of array range extensions: 

listing C.6 (C++) , 



template<class n_t> 
inline rng_t ext (const rng_t &r, const n_t &n) 
return rng_t { 

(r - n) . first () , 

(r + n) . last ( ) 



) ; 



def ext (r, n) : 

if {type (n) == 
n = one 

return slice ( 
(r - n) . start 
(r + n) . stop 

) 



. listing P 
int) & (n 



(Python) 
1) : 



listing F.4 (Fortran) 



module halo_m 

use arakawa_c_m 
implicit none 

interface ext 

module procedure ext_n 

module procedure ext_h 
end interface 

contains 

function ext_n{r, n) result (return) 
integer, intent (in) : : r{2) 
integer, intent (in) : : n 



integer : : return{2) 

return = {/ r(l) - n, r{2) + n /) 
end function 

function ext_h(r, h) result (return) 
integer, intent(in) : : r(2) 
type{half_t) , intent(in) : : h 
integer : : return{2) 

return = {/ r(l) - h, r{2) + h /) 
end function 
end module 



Consequently, a range depicted by i ± 1/2 may be expressed 
in the code as ext(i, h). In all three implementations the ext() 
function accept the second argument to be an integer or a "half 
(cf. section |23I >. 



2.5. Array index permutations 

Hereinafter, the n d ab symbol is used to denote a cyclic permu- 
tation of an order d of a set {a, b). It is used to generalise the 
MPDATA formulae into multiple dimensions using the follow- 
ing notation: 



d=0 

Blitz++ ships with the RectDomain class (aliased here as 
idx_t) for specifying array ranges in multiple dimensions. The 
n permutation is implemented in C++ as a function pi() return- 
ing an instance of idx_t. In order to ensure compile-time eval- 
uation, the permutation order is passed via the template param- 
eter d (note the different order of i and j arguments in the two 
template specialisations): 

listing C.7 (C++) 



template<int d> 

inline idx_t pi (const rng_t &i, const rng_t &j) ; 

tempi at e<> 

inline idx_t pi<0> (const rng_t &i, const rng_t &j) 
{ 

return idx_t ( { i, j } ) ; 



templateo 

inline idx_t pi<l> (const rng_t &j, const rng_t &i) 
{ 

return idx_t ( { i, j } ) ; 



NumPy uses tuples of slices for addressing multi- 
dimensional array with a single object. Therefore, the following 
definition of function pi() suffices to represent tt: 



def pi (d, *idx) : 
return ( idx [ d] , 



listing P. 6 (Python) 
idx [d- 1] ) 



In the Fortran implementation pi() returns a pointer to the 
array elements specified by i and j interpreted as (i,j) or (j,i) 
depending on the value of the argument d. In addition to pi(), a 
helper span() function returning the length of one of the vectors 
passed as argument is defined: 

_ listing F.5 (Fortran) 



module pi_m 
use real_m 
implicit none 
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j) result {return) 



contains 

function pi (d, arr, 

integer, intent (in) : : d 

real ( real_t ) , allocatable, target :: 

real(real_t) , pointer : : return! : , : ) 



arr { : 



integer, intent {in} 
select case (d) 
case (0) 

return => arr { i{l) 
case (1) 

return => arr { j{l) 
end select 
end function 

pure function span (d, 
integer, intent (in) 
integer, intent {in) : : d 
integer : return 
select case (d) 
case (0) 

return = i (2) - i (1) 
case (1) 

return = j (2) - j (1) 
end select 
end function 
end module 



i(2), j(2) 

: i(2), j(l) 
: j (2) , i (1) 



(2) ) 
(2) ) 



j) result (return) 

i(2), j(2) 



The span() function is used to shorten the declarations of ar- 
rays to be returned from functions in the Fortran implementa- 
tion (see listings F. 1 1 and F. 17-F.20). 

It is worth noting here that the C++ implementation of pi() 
is branchless thanks to employment of template specialisation. 
With Fortran one needs to rely on compiler optimisations to 
eliminate the conditional expression within the pi() that de- 
pends on value of d which is always known at compile time. 

2.6. Prototype solver 

The tasks to be handled by a prototype advection equation 
solver proposed herein are: 

(i) storing arrays representing the tf/ and C fields and any re- 
quired housekeeping data, 

(ii) allocating and deallocating the required memory, 

(iii) providing access to the solver state, 

(iv) performing the integration by invoking the advection- 
operator and boundary-condition handling routines. 

In the following C++ definition of the solver structure, task 
(i) is represented with the definition of the structure member 
fields; task (ii) is split between the solver's constructor and the 
destructors of arrvec_t; task (iii) is handled by the accessor 
methods; task (iv) is handled within the solve method: 

listing C.8 (C++) 

template<class bcx_t, class bcy_t> 
struct solver 



// member fields 
arrvec_t psi, C; 
int n, hlo; 
rng_t i, j; 
bcx_t bcx; 
bcy_t bey; 

// ctor 

solver (int nx, int ny, int hlo) 
hlo (hlo) , 
n(0) , 

i(0, nx-1), 
j(0, ny-1), 



bcx ( i, j , hlo) , 
bcy(j, i, hlo) 

for (int 1 = 0; 1 < 2; ++1) 

psi . push_back (new arr_t(ext(i, hlo), ext(j, hlo))) 
C . push_back (new arr_t(ext(i, h) , ext(j, hlo))); 
C . push_back (new arr_t (ext (i, hlo), ext(j, h) ) ) ; 



// accessor methods 
arr_t state ( ) { 

return psi[n] (i,j) .reindex({0,0}) 



arr_t courant (int d) 
{ 

return C [d] ; 



// helper methods Invoked by solve () 
virtual void advop ( ) = 0; 



void cycle ( } 
{ 



n = (n + 1) % 2 



2; 



// integration logic 
void solve (const int nt) 



for 

{ 



(int t 



0; t < nt; ++t) 



bcx . f ill_halos (psi [n ] , 
bey. fill_halos (psi [n] , 
advop ( ) ; 
cycle ( ) ; 



ext ( j, 
ext (i, 



hlo) ) ; 
hlo) ) ; 



The solver structure is an abstract definition (containing a pure 
virtual method) requiring its descendants to implement at least 
the advopO method which is expected to fill psi[n+l] with an 
updated (advected) values of psi[n]. The two template parame- 
ters bcx_t and bcy_t allow the solver to operate with any kind 
of boundary condition structures that fulfil the requirements im- 
plied by the calls to the methods of bcx and bey, respectively. 

The donor-cell and MPDATA schemes both require only the 
previous state of an advected field in order to advance the so- 
lution. Consequently, memory for two time levels (iffi 1 ' and 
^[«+i]) j s allocated in the constructor. The sizes of the arrays 
representing the two time levels of if/ are defined by the domain 
size (nx x ny) plus the halo region. The size of the halo region 
is an argument of the constructor. The cycle() method is used 
to swap the time levels without copying any data. 

The arrays representing the O-*' and C"' components of C, 
require (nx+1) x ny and nx x (ny+1) elements, respectively (be- 
ing laid out on the Arakawa-C staggered grid). 

Python definition of the solver class follows closely the C++ 
structure definition: 

listing P. 1 (Python) 



class solver(ob ject) : 
# ctor-like method 

def init (self, bcx 

self.n = 
self. hlo = hlo 
self.i = slice (hlo, 
self.j = slice(hlo, 



bey, nx, ny, hlo) : 

nx + hlo) 
ny + hlo) 



self .bcx 
self .bey 

self .psi 



bcx (0, 
bey (1, 

( 



self . i, 
self . j , 



hlo) 
hlo) 



numpy . empty ( ( 
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ext. (self . i, self, hi o) 

ext (self, j, self . hlo ) 

), real_t), 

numpy . empty ( ( 

ext (self. i, self. hlo) 

ext (self. j, self. hlo) 

), real_t) 



stop, 
stop 



stop, 
stop 



slf.C = ( 

numpy . empty ( ( 
ext ( self . i , 
ext ( self . j , 

), real_t), 

numpy . empty ( ( 
ext ( self . i , 
ext ( self . j , 

), real_t) 



hlf ) . stop, 
self . hlo ) . stop 



self . hlo ) . stop, 
hlf) . stop 



) 



# accessor methods 
def state (self) : 

return self . psi [ self . n ] [self.i, self.j] 

# helper methods invoked by solve () 
def courant (self , d) : 

return self .C[d] [ : ] 



def cycle (self) : 
self.n = (self.n 



1) 



# integration logic 
def solve (self, nt } : 
for t in range (nt) : 
self . bcx . f ill_halos ( 

self. psi [self.n] , ext (self. j, self. hlo) 

) 

self . bey . f ill_halos ( 

self. psi [self.n] , ext (self.i, self. hlo) 

) 

self . advop ( ) 
self . cycle ( ) 



The key difference stems from the fact that, unlike Blitz++, 
NumPy does not allow an array to have arbitrary index base - 
in NumPy the first element is always addressed with 0. Conse- 
quently, while in C++ (and Fortran) the computational domain 
is chosen to start at (i=0, j=0) and hence a part of the halo re- 
gion to have negative indices, in Python the halo region starts 
at (0,0fl However, since the whole halo logic is hidden within 
the solver, such details are not exposed to the user. The bcx and 
bey boundary-condition specifications are passed to the solver 
through constructor-like init () method as opposed to tem- 
plate parameters in C++. 

The above C++ and Python prototype solvers in principle 
allow to operate with any boundary condition objects that im- 
plement methods called from within the solver. This require- 
ment is checked at compile-time in the case of C++, and at 
run-time in the case of Python. In order to obtain an analo- 
gous behaviour with Fortran, it is required to define, prior to 
definition of a solver type, an abstract type with deferred pro- 
cedures having abstract interfaces [sic!, see Table 2.1 in ml for 
a summary of approximate correspondence of OOP nomencla- 
ture between Fortran and C++]: 

listing F.6 (Fortran) 



module bcd_m 
use arrvec m 



The reason to allow the domain to begin at an arbitrary index is mainly to 
ease debugging in case the code would be used in parallel computations using 
domain decomposition where each subdomain could have its own index base 
corresponding to the location within the computational domain 



implicit none 

type, abstract : : bcd_t 
contains 

procedure (bcd_f ill_halos) 
procedure (bcd_init) , 
end type 



deferred : 
deferred : : init 



f ill_halos 



abstract interface 

subroutine bcd_f ill_halos (this, 
impo r t : : b c d_t , r e a l_t 
class(bcd_t ) : : this 
real{real_t) , allocatable : : 
integer : : j (2 ) 

end subroutine 

subroutine bcd_init {this, d, n, 
import : : bcd_t 



hlo) 



class(bcd_t) 
integer : : d, 
end subroutine 
end interface 
and module 



this 
, hlo 



Having defined the abstract type for boundary-condition ob- 
jects, a definition of a solver class following closely the C++ 
and Python counterparts may be provided: 

listing F.7 (Fortran) 



module solver_m 
use arrvec_m 
use bcd_m 
use arakawa_c_m 
use halo_m 
implicit none 

type, abstract : : solver_t 

class (arrvec_t ) , allocatable :: psi, 

integer : : n, hlo 

integer :: i(2), j(2) 

class (bcd_t) , pointer : : bcx, bey 

contains 

procedure : : solve => solver_solve 
procedure : : state 
procedure : : courant 
procedure : : cycle 



procedure (solver_advop) 
end type 



solver_state 
solver_courant 
solver_cycle 
deferred : : advop 



abstract interface 

subroutine solver_advop (this ) 
import solver_t 

class ( solver_t ) , target : : this 
end subroutine 
end interface 



bcx, bey, nx, ny , hlo) 



target 

ny, hlo 



bcx, bey 



contains 

subroutine solver_ctor (this, 
use arakawa_c_m 
use halo_m 

class ( solver_t ) : : this 
class(bcd_t) , intent(in) , 
integer, intent(in) : : nx, 

this%n - 

this%hlo = hlo 

this%bcx => bcx 

this%bcy => bey 

this%i - (/ 0, nx - 1 /) 
this%j - (/ 0, ny - 1 /) 

call bcx%init(0, nx, hlo) 
call bcy%init(l, ny, hlo) 



allocate{this%psi) 

call this%psi%ctor (2) 
block 

integer : : n 

do n=0, 1 

call this%psi%init ( 

n, ext (this%i, hlo) , ext (this%j, hlo) 
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25-1 
255 
256 
257 
258 
259 
260 



end do 
end block 

allocate(this%C) 

call this%C%ctor (2) 

call this%C%init{ & 

0, ext (this%i, h) , ext (this%j, hlo) & 

) 

call this%C%init{ & 

1, ext{this%i, hlo), ext(this%j, h) & 

) 

end subroutine 

function solver_state (this) result (return) 
class { solver_t ) : : this 
real ( real_t ) , pointer : : return{ : , : ) 

return => this%psi%at (this%n) %p%a { & 
this%i(l) : this%i(2), & 
this% j (1) : this% j (2) & 

) 

end function 

function solver_courant (this, d) result (return) 
class { solver_t ) : : this 
integer : : d 

real ( real_t ) , pointer : : return( : , : ) 
return => this%C%at (d) %p%a 
end function 

subroutine solver_cycle (this ) 

class { solver_t ) : : this 

this%n = mod(this%n +1+2, 2) - 2 
end subroutine 

subroutine solver_solve (this, nt ) 
class { solver_t ) : : this 
integer, intent (in) : : nt 
integer : : t 

do t = 0, nt-1 

call this%bcx%f ill_halos { & 
this%psi%at (this%n) %p%a, ext (this%j, this%hlo) & 

) 

call this%bcy%f ill_halos { & 
this%psi%at (this%n) %p%a, ext (this%i, this%hlo) & 

) 

call this%advop() 
call this%cycle() 
end do 
end subroutine 
end module 



2. 7. Periodic boundaries ( C++) 

From this point, only C++ implementation is explained in 
the main text. The Python and Fortran implementations are in- 
cluded in appendices P and F 

The solver definition described in section lZ6l requires a given 
boundary condition object to implement a fill_halos() method. 
An implementation of periodic boundary conditions in C++ is 
provided in the following listing: 

listing C.9 (C++) 



template<int d> 
struct cyclic 
{ 

// member fields 

rng_t left_halo, rght_halo; 

rng_t left_edge, rght_edge; 



// ctor 
cyclic ( 

const rng_t &i, 
) : 



const rng_t &j, int hlo 



left_halo (i . first ( ) hlo, i . first ()- 1 ) , 
rght_edge (i . last () -hlo+1, i.lastl) ), 
rght_halo (i . last () +1, i.last()+hlo ), 
left_edge (i. first () , i . first () +hlo-l) 



// method invoked by the solver 



void fill_halos (const arr_t &a, const rng_t &j) 
{ 

a (pi<d> (left_halo, j)} = a (pi<d> (rght_edge, j)) 
a (pi<d> (rght_halo, j)} = a (pi<d> (lef t_edge, j)) 



As hinted by the member field names, the flll_halos() meth- 
ods fill the left/right halo regions with data from the right/left 
edges of the domain. Thanks to employment of the function 
pi() described in section |231 the same code may be applied in 
any dimension (here being a template parameter). 

Listings P. 8 and F.8 contain the Python and Fortran counter- 
parts to listing C.9. 

2.8. Dono r- cell formulce (C++) 

MPDATA is an iterative algorithm in which each iteration 
takes the form of the so-called donor-cell formula (which itself 
is a first-order advection scheme). 

MPDATA and donor-cell are explicit forward-in-time algo- 
rithms - they allow to predict t/r [n+I1 as a function of i^'" 1 where 
n and n + 1 denote two adjacent time levels. The donor-cell 
scheme may be written as [eq. 2 in@]: 



i+i]_,/>] V/pLw ,/>] rM 

d=0 



(2) 



(A [n] C 



where N is the number of dimensions, and F is the so-called 
flux function (0, eq. 3]: 



F(iff L , ijj R , C) = max(C, 0) • 4r L + min(C, 0) • ifr R 
C + \C\ C-\C\ 

The flux function takes the following form in C++: 

listing C.10 (C++) 

template class Tl, class T2, class T3> 
inline auto F ( 

const Tl spsi_l, const T2 &psi_r, const T3 SC 



(3) 



) 



return_macro ( 
( 

(C + abs (C) ) 
(C - abs (C) ) 

) / 2 



psi_ 
psi_ 



Equation [2] is split into the terms under the summation (effec- 
tively the 1 -dimensional donor-cell formula): 



listing C . 11 (C++) 



template<int d- 
inline auto donorcell ( 

const arr_t &psi, const arr_ 



t SC, 



const rng_t &i, 
return_macro ( 
F ( 

psi (pi<d> (i, 
psi (pi<d> (i+1, 
C (pi<d> (i+h, 



const rng_t & j 



j) ) , 
j)), 
j) ) 



F ( 



psi (pi<d> (i-1, j)), 
psi (pi<d> (i, j) ) , 
C(pi<d>(i-h, j)) 
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and the actual two-dimensional donor-cell formula: 



listing C.12 (C++) 

void donorcell_op ( 

const arrvec_t &psi, const int n, 

const arrvec_t &C, 

const rng_t &i, const rng_t &j 



) { 



psi[n + l] (i, j) = psi[n] (i, j) 

- donorcell<0> (psi [n] , C[0], 

- donorcell<l> (psi [n] , C[l], 



j) 
i) i 



Listings P.9-P1 1 and F.9-F. 13 contain the Python and Fortran 
counterparts to listings C.12-C.15. 

2.9. Donor-cell solver ( C++) 

As mentioned in the previous section, the donor-cell formula 
constitutes an advection scheme, hence we may use it to create 
a solver_donorcell implementation of the abstract solver class: 

listing C.13 (C++) 



template<class bcx_t, class bcy_t> 
struct solver_donorcell : solver<bcx_t , bcy_t> 
( 

solver_donorcell (int nx, int ny) : 

solver<bcx_t , bcy_t>(nx, ny, 1) 



void advop ( ) 
{ 

donorcell_op ( 

this->psi, this->n, this->C, 
this->i, this->j 

) ; 



The above definition is given as an example only. In the fol- 
lowing sections an MPDATA solver of the same structure is 
defined. 

Listings P. 12 and F.14 contain the Python and Fortran coun- 
terparts to listing C.16. 



For positive-definite if/, the A and B terms take the following 
fomS 



A aj ,= 



B\ d] 



(6) 



(7) 



UJ] 2 <%;]+< + + ( A[/j']+<. + ( A[«,;]+<„ 1 

If the denominator in equations [6] or [7] equals zero for a 
given i and j, the corresponding A^yj and By a are set to zero 
what may be conveniently represented with the where construct 
(available in all three considered languages): 

listing C.14 (C++) 



template<class nom_t, class den_t> 
inline auto mpdata_f rac ( 

const nom_t &nom, const den_t &den 
} return_macro ( 

where (den > 0, nom / den, 0) 

) 



The A term defined in equation[6]takes the following form: 



listing C . 15 (C++) 



template<int d> 
inline auto mpdata_A (const arr_t &psi, 

const rng_t &i, const rng_t Sj 
} return_macro ( 
mpdata_frac ( 

psi (pi<d> (i+1, j)) - psi (pi<d> (i, j ) ) , 
psi (pi<d> (i+1, j)) + psi (pi<d> (i, j ) ) 



The B term defined in equationQtakes the following form: 



listing C.16 (C++) 



template<int d 
inline auto mpdata_B (const arr_t 
const rng_t &i, const rng_t &j 
) return_macro ( 
mpdata_f rac ( 

psi (pi<d> (i+1, j+1) ) 
psi (pi<d> (i+1, j-1) ) 
psi (pi<d> (i+1, j+1) ) 
psi (pi<d> (i+1, j-1) ) 
) / 2 

) 



&psi, 



+ psi (pi<d> (i, 
psi (pi<d> (i, 
+ psi (pi<d> (i, 
+ psi (pi<d> (i, 



+ 1) ) 
-1) ) , 
+ 1) ) 
-1) ) 



2.10. MPDATA formula: (C++) 

MPDATA introduces corrective steps to the algorithm de- 
fined by equation [2] and [3] Each corrective step is a donor- 
cell step (eq. [2]) with the Courant number fields corresponding 
to the MPDATA antidiffusive velocities of the following form 
[eqs 13, 14in0]: 



C 



>[d] 



C 



Id] 



1 



c 



[d] 



M+< 

yffl 



- Z ^*S*«'0» 



(4) 



q=0,q*d 



where if/ and C represent values from the previous iteration and 
where: 



7*1 



- 1 (r^ 



[?] + gig] + 



Ml] + Ml] 



(5) 



Equation|5]takes the following form: 



template<int d> 
inline auto mpdata_C_bar ( 

const arr_t &C, 

const rng_t &i, 

const rng_t & j 
) return_macro ( 



listing C.17 (C++) 



( 



C (pi<d> (i+1, 
C (pi<d> (i+1, 

/ 4 



j+h) ) 
j-h) ) 



C (pi<d> (i, 
C (pi<d> (i, 



j + h) ) 
j h) ) 



Equation 3] take the following form: 



listing C.18 (C++) 

template<int d> 

inline auto mpdata_C_adf ( 

const arr_t &psi, 

const rng_t &i, const rng_t 

const arrvec_t &C 
) return_macro ( 

abs (C [d] (pi<d> (i+h, j) ) ) 



9 Since i// > 0, |A| < 1 and \B\ < 1. See Smolarkiewicz (TT|, Sec. 4.2] for 
description of adaptation of the formulse for advection of fields of variable sign 



(1 - abs (C [d] (pi<d> (i+h, j)))) 
mpdata_A<d> (psi, i, j) 
C [d] (pi<d> (i + h, j) ) 
mpdata_C_bar<d> (C [d-1] , i, j) 
mpdata_B<d> (psi, i, j) 



Listings P. 13-P. 17 and F. 15-F.21 contain the Python and For- 
tran counterparts to listing C.16-C.22. 

2.11. MPDATA solver (C++) 

An MPDATA solver may be now constructed by inheriting 
from solver class with the following definition in C++: 

listing C.19 (C++) 



template<int n_iters, class bcx_t, class bcy_t> 
struct solver_mpdata : solver<bcx_t, bcy_t> 
{ 

// member fields 
arrvec_t tmp [ 2 ] ; 
rng_t im, jm; 



solver_mpdata (int nx, int ny) : 
solver<bcx_t, bcy_t> (nx, ny, 1), 
im (this->i . first ( ) - 1, this->i . last ) , 
jm (this->j . first ( ) - 1, this-> j . last ( ) ) 

{ 

int n_tmp = n_iters > 2 ? 2 : 1; 
for (int n - 0; n < n_tmp; ++n) 
( 

tmp[n] . push_back (new arr_t ( 

this->C [ ] . domain ( ) [ ] , this->C [ ] . domain ( ) [ 1 ] ) 

) ; 

tmp [n] .push_back (new arr_t ( 

this->C [ 1 ] . domain ( ) [ ] , this->C [ 1 ] . domain ( ) [ 1 ] ) 



// method invoked by the solver 

void advop ( ) 

{ 

for (int step = 0; step < n_iters; ++step) 
( 

if (step == 0) 
donorcell_op ( 

this->psi, this->n, this->C, this->i, this->j 

) ; 
else 

{ 

this->cycle ( ) ; 
this->bcx . f ill_halos ( 

this->psi [this->n] , ext(this->j, this->hlo) 

) ; 

this->bcy . f ill_halos ( 

this->psi [this->n] , ext(this->i, this->hlo) 

) ; 

// choosing input/output for antidiff C 
const arrvec_t 

.iC_unco = (step == 1) 
? this->C 
: (step % 2) 

? tmp [1] // odd steps 
: tmp [ ] , // even steps 
.iC_corr = (step % 2) 

? tmp [0] // odd steps 
: tmp [ 1 ] ; // even steps 

// calculating the antidiffusive C 
C_corr[0] (im+h, this->j) = mpdata_C_adf <0> ( 
this->psi [this->n] , im, this->j, C_unco 

) ; 

this->bcy . f ill_halos (C_corr [ ] , ext (this->i, h) ) ; 

C_corr[l] (this->i, jm+h) = mpdata_C_adf <1> ( 
this->psi [this->n] , jm, this->i, C_unco 

) ; 

this->bcx . f ill_halos (C_corr [ 1 ] , ext (this-> j , h) ) ; 

// donor-cell step 



donorcell_op ( 

this->psi, this->n, C_corr, this->i, this~>j 



The array of sequences of temporary arrays tmp allocated in 
the constructor is used to store the antidiffusive velocities from 
the present and optionally previous timestep (if using more than 
two iterations). 

The advop() method contrails the MPDATA iterations within 
one timestep. The first (step = 0) iteration of MPDATA is an 
unmodified donor-cell step (compare listing C.15). Subsequent 
iterations begin with calculation of the antidiffusive Courant 
fields using formula |4] In order to calculate values spanning 
an (1-V2 ... 1+V2) range using a formula for C[;+i/ v .j only, the 
formula is evaluated using extended index ranges im and jm. 
In the second (step=l) iteration the uncorrected Courant field 
(C_unco) points to the original C field, and the antidiffusive 
Courant field is written into C_corr which points to tmp[l]. 
In the third (step=2) iteration C_unco points to tmp[l] while 
C_corr points to tmp[0]. In subsequent iterations tmp[0] and 
tmp[l] are alternately swapped. 

Listings P. 18 and F.22 contain the Python and Fortran coun- 
terparts to listing C.23. 

2.72. Usage example ( C++) 

The following listing provides an example of how the MP- 
DATA solver defined in section l2~TT1 mav be used together with 
the cyclic boundary conditions defined in section |2~71 In the ex- 
ample a Gaussian signal is advected in a 2D domain defined 
over a grid of 24x24 cells. The program first plots the ini- 
tial condition, then performs the integration for 75 timesteps 
with three different settings of the number of iterations used 
in MPDATA. The velocity field is constant in time and space 
(although it is not assumed in the presented implementations). 
The signal shape at the end of each simulation is plotted as well. 
Plotting is done with the help of the gnuplot-iostream librar>0 

The resultant plot is presented herein as Figure |2] The top 
panel depicts the initial condition. The three other panels show 
a snapshot of the field after 75 timesteps. The donor-cell so- 
lution is characterised by strongest numerical diffusion result- 
ing in significant drop in the signal amplitude. The signals ad- 
vected using MPDATA show smaller numerical diffusion with 
the solution obtained with more iterations preserving the sig- 
nal altitude more accurately. In all of the simulations the sig- 
nal maintains its positive definiteness. The domain periodicity 
is apparent in the plots as the maximum of the signal after 75 
timesteps is located near the domain walls. 

Listings P. 19 and F.23-F.24 contain the Python and Fortran 
counterparts to listing C.24 (with the set-up and plotting logic 
omitted). 



'"gnuplot-iostream is a header-only C++ library allowing gnuplot to be con- 
trolled from C++, see http://stahlke.org/dan/gnuplot-iostream/ 
Gnuplot is a portable command-line driven graphing utility, see 
http : //gnuplot . info/ 
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322 


int main ( ) 






323 


{ 








324 


int n [ ] - 


{24, 24}, nt - 75; 




325 


Gnuplot gp 






326 


gp « 


"set 


term pdf size 10cm, 30cm\n" 


327 


<< 


"set 


output ' figure .pdf \n" 




328 


<< 


"set 


multiplot layout 4,l\n" 




329 


<< 


"set 


border 4095\n" 




330 


<< 


"set 


xtics out\n" 




331 


<< 


"set 


ytics out\n" 




332 


<< 


"unset ztics\n" 




333 


<< 


"set 


xlabel ' X'\n" 




334 


<< 


"set 


ylabel 'Y'\n" 




335 


<< 


"set 


xrange [0:" << n[x]-l << 


"]\n" 


336 


<< 


"set 


yrange [0:" << n[y]-l << 


"]\n" 


337 


<< 


"set 


z range [-.666:l]\n" 




338 


<< 


"set 


cbrange [-.025:1.025] \n" 




339 


<< 


"set 


palette maxcolors 42 \n" 




340 


<< 


"set 


pm3d at b\n" ; 





listing C.20 (C++) 



# include "listings . hpp" 
#define GNUPLO T_ENABLE_BL I TZ 

^include <gnuplot-i ostream/gnuplot -iostream. h> 
enum { x, y } ; 



template <class T> 
void setup (T &solver, 



int n[2] 



blit z : : first Index i ; 

blit z : : secondlndex j ; 

solver . state ( ) - exp ( 

-sqr (i -n [x] 12 . ) / (2 *pow (n [x] /10 . , 
-sqr ( j-n [y] 12.) I (2*pow (n [y] /10 . , 

) ; 

solver . courant (x) = -.5; 
solver . courant (y) = -.25; 



2) } 
2) } 



std:: string binfmt; 
{ 

solver_donorcell<cyclic<x>, cyclic<y>> 

slv (n [x] , n [y] ) ; 
setup (slv, n) ; 

binfmt = gp . binfmt ( slv . state ( ) } ; 
gp << "set title 't=0'\n" 

<< "splot ' -' binary" << binfmt 

<< "with lines notitle\n" ; 
gp . sendBinary (slv . state ( } . copy ( ) } ; 
slv . solve (nt ) ; 

gp << "set title 'donorcell t=" <<nt <<" ' \n" 
<< "splot ' -' binary" << binfmt 
<< "with lines notitle\n"; 

gp . sendBinary (slv . state { } . copy ( ) } ; 



const int it = 2; 

solver_mpdat a<it , cyclic <x>, cyclic<y>> 

slv (n [x] , n [y ] } ; 
setup {slv, n) ; 
slv . solve (nt ) ; 

gp << "set title 'mpdata<" << it << "> " 
<< "t=" << nt << " ' \n" 
<< "splot ' -' binary" << binfmt 
<< "with lines notitle\n"; 

gp . sendBinary (slv . state ( } . copy ( ) ) ; 



const int it = 44; 

solver_mpdat a<it , cyclic <x>, cyclic<y>> 

slv (n [x] , n [y ] } ; 
setup (slv, n) ; 
slv . solve (nt } ; 

gp << "set title 'mpdata<" << it << "> " 
<< "t=" << nt << " ' \n" 
<< "splot ' -' binary" << binfmt 
<< "with lines notitle\n"; 

gp . sendBinary (slv . state ( } . copy ( ) ) ; 




donorcell t=75 




mpdata<2> t=75 




mpdata<44> t=75 




Figure 2: Plot generated by the program given in listing C.24. The top panel 
shows initial signal shape (at time t=0). The subsequent panels show snapshots 
of the advected field after 75 timesteps from three different simulations: donor- 
cell (or 1 MPDATA iteration), MPDATA with two iterations and MPDATA with 
44 iterations. The colour scale and the wire-frame surface correspond to signal 
amplitude. See section [2. 12 I f or discussion. 



n 



3. Performance evaluation 

The three introduced implementations of MPDATA were 
tested with the following set-ups employing free and open- 
source tools: 



100 



C++: 



• GCC g++ 4.8.C0 and Blitz++ 0.10 

• LLVM Clang 3.2 and Blitz 0.10 



Python: 



Fortran: 



CPython 2.7.3 and NumPy 1.7 

PyPy 1.9.0 with built-in NumPy implementation 

GCC gfortran4.8.cP 



The performance tests were run on a Debian and an Ubuntu 
GNU/Linux systems with the above-listed software obtained 
via binary packages from the distributions' package repositories 
(most recent package versions at the time of writing). The tests 
were performed on two 64-bit machines equipped with an AMD 
Phenom™ II X6 1055T (800 MHz) and an Intel® Core™ i5- 
2467M (1.6 GHz) processors. 

For both C++ and Fortran the GCC compilers were invoked 
with the -Ofast and the -march=native options. The Clang 
compiler was invoked with the -03, the -mllvm -vectorize, the 
-ffast-math and the -march=native options. The CPython in- 
terpreter was invoked with the -OO option. 

In addition to the standard Python implementation CPython, 
the Python code was tested with PyPy. PyPy is an alterna- 
tive implementation of Python featuring a just-in-time com- 
piler. PyPy includes an experimental partial reimplementation 
of NumPy that compiles NumPy expressions into native assem- 
bler. Thanks to employment of lazy evaluation of array expres- 
sions (cf. Sect. 12. U PyPy allows to eliminate the use of tem- 
porary matrices for storing intermediate results, and to perform 
multiple operations on the arrays within a single array index 
traversal 1 12 l. Consequently, PyPy allows to overcome the same 
performance-limiting factors as those addressed by Blitz ++, al- 
though the underlying mechanisms are different. In contrast 
to other solutions for improving per formance of NumPy-based 
codes such as Cythord 

numexpO or NumbcQ PyPy does not 
require any modifications to the code. Thus, PyPy may serve 
as a drop-in replacement for CPython ready to be used with 
previously-developed codes. 

The same set of tests was run with all four set-ups. Each 
test set consisted of 16 program runs. The test programs are 
analogous to the example code presented in section 12.121 The 



"GNU Compiler Collection packaged in the Debian's gcc- 
snapshot_20 130222-1 

12 Lazy evaluation available in PyPy 1.9 has been temporarily removed from 
PyPy during a refactoring of the code. It'll be reinstantiated in the codebase as 
soon as possible, but past PyPy 2.0 release 

13 see http : / /cython.org| 

14 see http: / /code . google . com/p/numexpr/ 

15 see http: / /numba.pydata. org/ 
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Figure 3: Memory consumption statistics for the test runs described in Sec- 
tion f5] plotted as a function of grid size. Peak resident set size (rss) values 
reported by the GNU time utility are normalised by the size of data that needs 
to be allocated in the program to store all declared grid-sized arrays. Asymp- 
totic values reached at the largest grid sizes are indicative of temporary storage 
requirements. 



tests were run with different grid sizes ranging from 64x64 
to 2048x2048. The Gaussian impulse was advected for nt = 
2 24 /(nx ■ ny) timesteps (2 24 chosen arbitrarily), in order to as- 
sure comparable timing accuracy for all grid sizes. Three MP- 
DATA iterations were used (i.e. two corrective steps). The ini- 
tial condition was loaded from a text file, and the final values 
were compared at the end of the test with values loaded from 
another text file assuring the same results were obtained with all 
four set-ups. The tests were run multiple times; program start- 
up, data loading, and output verification times were subtracted 
from the reported values (see caption of Figure |4]for details). 

Figure [3] presents a plot of the peak memory use0 (identi- 
cal for both considered CPUs) as a function of grid size. The 
plotted values are normalised by the nominal size of all data 
arrays used in the program (i.e. two (nx+2)x(ny+2) arrays 
representing the two time levels of i/r, a (nx+l)x(ny+2) ar- 
ray representing the C w component of the Courant number 
field, a (nx+2)x(ny+l) array representing the C [vl component, 
and two pairs of arrays of the size of C w and for stor- 
ing the antidiffusive velocities, all composed of 8-byte double- 
precision floating point numbers). Plotted statistics reveal a no- 
table memory footprint of the Python interpreter itself for both 
CPython and PyPy, losing its significance for domains larger 
than 1024x1024. The roughly asymptotic values reached in all 
four set-ups for grid sizes larger that 1024x1024 are indicative 
of the amount of temporary memory used for array manipu- 
lation. PyPy- and Blitz++-based set-ups consume notably less 
memory than Fortran and CPython. This confirms the effective- 



16 The resident set size (rss) as reported by GNU time (version 1.7-24) 
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Figure 4: Execution time statistics for the test runs described in Section[3]plot- 
ted as a function of grid size. Values of the total user mode CPU time reported 
by the GNU time utility are normalised by the grid size (nx ■ ny) and the number 
of timesteps nt = 2 24 /(«.r ■ ny). Before normalisation the time reported for an 
nt = run for a corresponding domain size is subtracted from the values. Both 
the nt = and nt = 2 24 /Qnx ■ ny runs are repeated three times and only the 
shortest time is taken into account. Results obtained with an Intel® Core 1 " i5 
1.6 GHz processor. 



ness of the just-in-time compilation (PyPy) and the expression- 
templates (Blitz++) techniques for elimination of temporary 
storage during array operations. 

The CPU time statistics presented in Figures |4] and [5]reveal 
minor differences between results obtained with the two differ- 
ent processors. Presented results lead to the following obser- 
vations (where by referring to language names, only the results 
obtained with the herein considered program codes, and soft- 
ware/hardware configurations are meant): 

• Fortran gives shortest execution times for any domain size; 

• C++ execution times are less than twice those of Fortran 
for grids larger than 256x256; 

• CPython requires from around 4 to almost 10 times more 
CPU time than Fortran depending on the grid size; 

• PyPy execution times are in most cases closer to C++ than 
to CPython. 

The support for OOP features in gfortran, the NumPy support 
in PyPy, and the relevant optimisation mechanisms in GCC 
are still in active development and hence the performance with 
some of the set-ups may likely change with newer versions of 
these packages. 

It is worth mentioning, that even though the three implemen- 
tations are equally structured, the three considered languages 
have some inherent differences influencing the execution times. 
Notably, while Fortran and Blitz++ offer runtime array-bounds 
and array-shape checks as options not intended for use in pro- 
duction binaries, NumPy performs them always. Additionally, 
the C++ and Fortran set-ups may, in principle, benefit from 
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Figure 5: Same as Fig.gjfor an AMD Phenom™ II 800 MHz processor. 



GCC's auto-vectorisation features which do not have yet coun- 
terparts in CPython or PyPy. Finally, Fortran uses different or- 
dering for storing array elements in memory, but since all tests 
were carried out using square grids, this should not have had 
any impact on the performance^ 

The authors do expect some performance gain could be ob- 
tained by introducing into the codes some "manual" optimisa- 
tions - code rearrangements aimed solely at the purpose of in- 
creasing performance. These were avoided intentionally as they 
degrade code readability, should in principle be handled by the 
compilers, and are generally advised to be avoided [e.g.|22] sec- 
tion 3.12]. 



4. Discussion on the tradeoffs of language choice 

One of the aims of this paper is to show the applicabil- 
ity of OOP features of the three programming languages (or 
language-library pairs) for scientific computing. The main fo- 
cus is to represent what can be referred to as blackboard ab- 
stractions [21] within the code. Presented benchmark tests, 
although quite simplistic, together with the experience gained 
from the development of codes in three different languages pro- 
vide a basis for discussion on the tradeoffs of programming lan- 
guage choice. The discussion concerns in principle the devel- 
opment of finite-difference solvers for partial differential equa- 
tions, but is likely applicable to the scientific software in gen- 
eral. A partly objective and partly subjective summary of pros 
and cons of C++, Python and Fortran is presented in the four 
following subsections. 



17 Both Blitz++ and NumPy support Fortran's column-major ordering as 
well, however this feature is still missing from PyPy's built-in NumPy imple- 
mentation as of PyPy 1 .9 
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4.1. OOP for blackboard abstractions 

It was shown in section [2] that C++1 1/Blitz++, 
Python/NumPy and Fortran 2008 provide comparable func- 
tionalities in terms of matching the blackboard abstractions 
within the program code. Taking into account solely the part 
of code representing particular formulae (e.g. listings C.21, 
P. 17, F.20 and equation [4} all three languages allow to match 
(or surpass) KTgX in its brevity of formula translation syntax. 
All three languages were shown to be capable of providing 
mechanisms to compactly represent such abstractions as: 

• loop-free array arithmetics; 

• definitions of functions returning array-valued expres- 
sions; 

• permutations of array indices allowing dimension- 
independent definitions of functions (see e.g. listings C.12 
and C. 1 3, P. 10 and P. 1 1 , F. 1 1 and F. 12); 

• fractional indexing of arrays corresponding to employ- 
ment of a staggered grid. 

Three issues specific to Fortran that resulted in employment of 
a more repetitive or cumbersome syntax than in C++ or Python 
were observed: 

• Fortran does not feature a mechanism allowing to reuse a 
single piece of code (algorithm) with different data types 
(compare e.g. listings C.6, P.5 and F.4) such as templates 
in C++ and the so-called duck typing in Python; 

• Fortran does not allow function calls to appear on the left 
hand side of assignment (see e.g. how the ptr pointers 
were used as a workaround in the cyclic_flll_halos method 
in listing F.8); 

• Fortran lacks support for arrays of arrays (cf. Sect. 12.21 . 
Interestingly, the limitation in extendability via inheritance was 
found to exist partially in NumPy as well (see footnote|7]l. The 
lack of a counterpart in Fortran to the C++ template mechanism 
was identified in II23I1 as one of the key deficiencies of Fortran 
when compared with C++ in context of applicability to object- 
oriented scientific programming. 

4.2. Performance 

The timing and memory usage statistics presented in figures 
[3]|5] reveal that no single language/library/compiler set-up cor- 
responded to both shortest execution time and smallest memory 
footprint. 

One may consider performance measures addressing not 
only the program efficiency but also the factors influencing the 
development and maintenance time/cost [of particular impor- 
tance in scientific computing, a. Taking into account such 
measures as code length or coding time, the Python environ- 
ment gains significantly. Presented Python code is shorter than 
the C++ and Fortran counterparts, and is simpler in terms of 
syntax and usage (see discussion below). 

Employment of the PyPy drop-in replacement for the stan- 
dard Python implementation brings Python's performance sig- 
nificantly closer to those of C++ and Fortran, in some cases 
making it the least memory consuming set-up. Python has 
already been the language of choice for scientific software 



projects having code clarity or ease of use as the first require- 
ment [see e.g.|25[[. PyPy's capability to improve performance of 
unmodified Python code may make Python a favourable choice 
even if high performance is important, especially if a combined 
measure of performance and development cost is to be consid- 
ered. 

4.3. Ease of use and abuse 

Using the number of lines of code or the number of distinct 
language keywords needed to implement the MPDATA-based 
solver presented in section [2] as measures of syntax brevity, 
Python clearly surpasses its rivals. Python was developed with 
emphasis on code readability and object-orientation. Arguably, 
taking it to the extreme - Python uses line indentation to define 
blocks of code and treats even single integers as objects. As a 
consequence Python is easy to learn and easy to teach. It is also 
much harder to abuse Python than C++ or Fortran (for instance 
with goto statements, employment of the preprocessor, or the 
implicit typing in Fortran). 

Python implementations do not expose to the user the compi- 
lation or linking processes. As a result, Python-written software 
is easier to deploy and share, especially if multiple architectures 
and operating systems are targeted. However, there exist tools 
such as CMakd 18 ! that allow to efficiently automate building, 
testing and packaging of C++ and Fortran programs. 

Python is definitely easiest to debug among the three lan- 
guages. Great debugging tools for C++ do exist, however the 
debugging and development is often hindered by indecipher- 
able compiler messages flooded with lengthy type names stem- 
ming from employment of templates. Support for the OOP fea- 
tures of Fortran among free and open source compilers, debug- 
gers and other programming aids remains immature. 

With both Fortran and Python, the memory footprint caused 
by employment of temporary objects in array arithmetics is de- 
pendant on compiler choice or the level of optimisations. In 
contrast, Blitz++ ensures temporary-array-free computations 
by design [26] avoiding unintentional performance loss. 

4.4. Added values 

The size of the programmers' community of a given lan- 
guage influences the availability of trained personnel, reusable 
software components and information resources. It also affects 
the maturity and quality of compilers and tools. Fortran is a 
domain-specific language while Python and C++ are general- 
purpose languages with disproportionately larger users' com- 
munities. The OOP features of Fortran have not gained wide 
popularity among users [270 Fortran is no longer routinely 
taught at the universities 112811 - in contrast to C++ and Python. 
An example of decreasing popularity of Fortran in academia is 
the discontinuation of Fortran printed editions of the "Numeri- 
cal Recipes" series of Press et al. 



18 CMake is a family of open-source, cross-platform tools automat- 
ing building, testing and packaging of C/C++/Fortran software, see 
http : //cmake . org/ 

19 An anecdotal yet significant example being the incomplete support for 
syntax-highlighting of modern Fortran in Vim and Emacs editors 
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Blitz++ is one of several packages that offer high- 
performance object-oriented array manipulation functionality 
with C++ (and is not necessarily optimal for every purpose 
j29ll ). In contrast, the NumPy package became a de facto stan- 
dard solution for Python. Consequently, numerous Python li- 
braries adopted NumPy but there are apparently very few C++ 
libraries offering Blitz++ support out of the box (the gnuplot- 
iostream used in listing C.24 being a much-appreciated coun- 
terexample). However, Blitz++ allows to interface with vir- 
tually any library (including Fortran libraries), by resorting to 
referencing the underlying memory with raw pointers. 

The availability and quality of libraries that offer object- 
oriented interfaces differs among the three considered lan- 
guages. The built-in standard libraries of Python and C++ are 
richer than those of Fortran and offer versatile data types, col- 
lections of algorithms and facilities for interaction with host op- 
erating system. In the authors' experience, the small popularity 
of OOP techniques among Fortran users is reflected in the li- 
brary designs (including the Fortran's built-in library routines). 
What makes correct use of external libraries more difficult with 
Fortran is the lack of standard exception handling mechanism, 
a feature long and much requested by the numerical community 
lf30L Foreword] . 

Finally, the three languages differ as well with regard to 
availability of mechanisms for leveraging shared-memory par- 
allelisation (e.g. with multi-core processors). GCC supports 
OpenMP with Fortran and C++. The CPython and PyPy im- 
plementations of Python do not offer any built-in solution for 
multi-threading. 



• helping to keep the programs maintainable and avoiding 
accumulation of the code deb0 that besets scientific soft- 
ware in such domains as climate modelling |36]. 

The performance evaluation revealed that: 

• the Fortran set-up offered shortest execution times, 

• it took the C++ set-up less than twice longer to compute 
than Fortran, 

• C++ and PyPy set-ups offered significantly smaller mem- 
ory consumption than Fortran and CPython for larger do- 
mains, 

• the PyPy set-up was roughly twice slower than C++ and 
up to twice faster than CPython. 

The three equally-structured implementations required ca. 200, 
300, and 500 lines of code in Python, C++ and Fortran, respec- 
tively. 

In addition to the source code presented within the text, a set 
of tests and build-/test-automation scripts allowing to reproduce 
the analysis and plots presented in section [3] are all available in 
the CPC Program Library and at the project repository^, and 
are released under the GNU GPL license IU8I1 . The authors 
encourage to use the presented codes for teaching and bench- 
marking purposes. 

The OOP design enhances the possibilities to reuse and 
extend the presented code. Development is underway of 
an object-oriented C++ library featuring concepts presented 
herein, supporting integration in one to three dimensions, han- 
dling systems of equations with source terms, providing mis- 
cellaneous options of MPDATA and several parallel processing 
approaches. 



5. Summary and outlook 
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Appendix P. Python code for sections l2l7l42.11l 
Periodic Boundaries (cf. Sect. \2.7\ 



listing P. 8 (Python) 



class cyclic{ob ject) : 

# ctor 

def init (self, d, i, hlo) : 

self .d = d 

self . lef t_halo = slice (i . start-hlo, i. start ) 
self . rght_edge = slice (i . stop -hlo, i.stop ) 
self . rght_halo = slice (i . stop, i.stop +hlo) 

self . lef t_edge = slice (i . start, i.start+hlo) 

# method invoked by the solver 
def f ill_halos { self , psi, j): 

psi [pi (self . d, self . left_halo, j)] = ( 
psi [pi (self . d, self . rght_edge, j)] 

) 

psi [pi ( self . d, self . rght_halo, j)] = ( 
psi [pi (self . d, self . left_edge, j)] 



Donor-cell formulae (cf. Sect. \2.8[ 



listing P. 9 (Python) 
def f (psi_l, psi_r, C) : 



return ( 

(C + abs (C) ) 
(C - abs (C) ) 
) / 2 



psi_l 
psi_r 



def donorcell(d, 
return ( 

f ( 

psi [pi (d, 
psi [pi (d, 
C [pi (d, 



. listing P 
psi, C, i, 



(Python) 



psi [pi (d, 
psi [pi (d, 
C [pi (d, 



l+one, 
i+hlf , 



-hlf, 



j)], 
j)]. 
j) 1 



j)]. 
j)], 
j) 1 



listing P. 11 (Python) 

def donorcell_op (psi, n, C, i, j) : 
psi [n + 1] [i, j] = (psi[n][i,j] 

■ donorcell(0, psi [n] , C[C], i, j) 
- donorcell(l, psi [n] , C[l], j, i) 

) 



Donor-cell solver (cf. Sect. \2.9\> 

listing P. 12 (Python) 



class solver_donorcell( solver) 

def init (self, bcx, bey, 

solver. init (self, bcx, 

def advop(self) : 
donorcell_op ( 

self. psi, self.n, 
self.C, self.i, self.j 

) 



nx, ny) : 
bey, nx, 



ny, 1) 



MPDATA formula: (cf Sect. \2lW> 

listing P. 13 (Python) 
den) 



def mpdata_f rac (nom, 

return numpy . where (den > , nom/ den, ) 



listing P. 14 (Python) 

def mpdata_A (d, psi, i, j) : 
return mpdata_f rac ( 

psi [pi (d, i+one, j)] - psi [pi (d, i, 
psi [pi (d, i+one, j)] + psi [pi (d, i, 

) 



j): 
j): 



listing P. 15 (Python) 

j) = 



def mpdata_B (d, psi 

return mpdata_f rac ( 

psi [pi (d, i+one, j+one) ] + psi [pi (d, i, j+one) ] 
psi [pi (d, i + one, j-one) ] - psi [pi (d, i, j-one) ] , 
psi [pi (d, i+one, j+one)] + psi [pi (d, i, j+one)] 
psi [pi (d, i+one, j-one)] + psi [pi (d, i, j-one)] 

) / 2 



listing P. 16 (Python) 



def mpdata_C_bar (d, C, i, j) 
return ( 

C [pi (d, i+one, j+hlf ) ] 
C [pi (d, i+one, j-hlf ) ] 

) / 4 



C[pi(d, i, 
C[pi(d, i, 



j+hlf) ] 
j-hlf) ] 



listing P. 17 (Python) 

def mpdata_C_adf {d, psi, i, j, C) : 
return ( 

abs (C [d] [pi (d, i + hlf, j) ] ) 

* {1 - abs (C [d] [pi (d, i+hlf, j)])) 

* mpdata_A(d, psi, i, j) 
- c[d] [pi(d, i+hlf, j) ] 

* mpdata_C_bar (d, C[d-1], i, j) 

* mpdat a_B (d, psi, i , j ) 

) 



An MPDATA solver (cf. Sect. [377]) 



listing P. 18 (Python) 

class solver_mpdata(solver) : 

def init (self, n_iters, bcx, bey, nx, ny) : 

solver . init (self, bcx, bey, nx, ny, 1) 

self.im = slice (self. i.start-1, self. i. stop) 
self.jm = slice(self.j. start -1, self. j. stop) 



self . n_iters 



n_iters 



self .tmp = [ ( 

numpy . empty (self.C [0] .shape, real_t ) , 
numpy . empty (self . C [ 1 ] .shape, real_t) 

) ] 

if n_iters > 2 : 

self . tmp . append ( ( 

numpy . empty (self.C [0] .shape, real_t ) , 
numpy . empty ( self . C [ 1 ] . shape, real_t ) 

) ) 

def advop(self) : 

for step in range ( self . n_iters ) : 
if step == : 
donorcell_op ( 

self. psi, self.n, self.C, self.i, self.j 

) 

else: 

self . cycle ( } 

self . bcx . f ill_halos ( 

self. psi [self.n] , ext (self.j, self. hlo) 

) 

self .bey . f ill_halos ( 

self. psi [self.n] , ext (self.i, self. hlo) 

) 

if step == 1 : 

C_unco, C_corr = self . C, self . tmp [ ] 
elif step % 2 : 

C_unco, C_corr = self. tmp [ 1 ] , self . tmp [ ] 
else : 

C_unco, C_corr = self . tmp [ ] , self . tmp [ 1 ] 

C_corr [0] [self . im+hlf , self.j] = mpdata_C_adf ( 

0, self. psi [self.n], self.im, self.j, C_unco 

) 

self .bey . f ill_halos (C_corr [ ] , ext (self . i, hlf ) ) 

C_corr [1 ] [self . i, self.jm+hlf] = mpdata_C_adf ( 

1, self. psi [self. n], self.jm, self.i, C_unco 

) 

self .bcx . f ill_halos (C_corr [ 1 ] , ext (self.j, hlf)} 

donorcell_op { 

self. psi, self.n, C_corr, self.i, self.j 



16 



Usage example (cf. Sect. \2.12\ 

_^ t _ listing P .19 (Python) 

226 slv = solver_mpdata (it, cyclic, cyclic, nx, ny) 
slv . state ()[: ] = read_f ile ( f name, nx, ny) 
slv . courant { 0) [: ] = Cx 



slv . courant { 1 ) [ : ] = Cy 
slv . solve (nt ) 



Appendix F. Fortran code for sections l2Jl42.11l 



Periodic boundaries (cf. Sect. \2.7\ 

. listing fTs[ 



(Fortran) 



module cyclic_m 
use bcd_m 
use pi_m 
implicit none 

type, extends (bcd_t) 



cyclic_t 



integer : : d 
integer 
integer 
contains 

procedure 
procedure 
end type 

contains 



subroutine cyclic, 
class {cyclic_t ) 
integer : : d, n, 

this%d = d 

this%lef t_halo = 

this%rght_halo = 

this%lef t_edge = 

this%rght_edge = 
end subroutine 



left_halo (2) , rght_halo (2) 
le f t_edge ( 2 ) , rght_edge ( 2 ) 



init => cyclic_init 

f ill_halos => cyclic_f ill_halos 



init (this, 
: : this 
hlo 



d, n, hlo) 



-hlo, -1 /) 
n, n-1+hlo /) 
0, hlo-1 /) 
n-hlo, n-1 /) 



subroutine cyclic_f ill_halos (this, a, 
class { cyclic_t ) : : this 
real ( real_t ) , pointer : : ptr ( : , : ) 
real ( real_t ) , allocatable :: a(:,:} 
integer : : j (2 ) 

ptr => pi (this%d, a, this%left_halo, 
ptr = pi (this%d, a, this%rght_edge, 
ptr => pi (this%d, a, this%rght_halo, 
ptr = pi (this%d, a, this%left_edge, 
end subroutine 
end module 



Donor-cell formulae ( cf. Sect, \2. 8t 

. listing F . y i 



(Fortran) 



module donorcell_ 
use real_m 
use arakawa_c_n 
use pi_m 
use arrvec_m 
implicit none 
contains 



listing F.10 

elemental function F (psi_l, 
real(real_t) : : return 
real ( real_t ) , intent (in) 
return = ( 

(C + abs (C) ) * psi_l + 
{C - abs (C) ) * psi_r 
) / 2 
end function 



(Fortran) 
psi_r, C) 



result (return) 

psi_l, psi_r, C 



listing F.ll (Fortran) 

function donorcell {d, psi, C, i, j ) result (return) 
integer : : d 

integer, intent (in) :: i{2), j(2) 

real ( real_t ) : : return ( span (d, i, j), span (d, j, i)} 
real (real_t) , allocatable, intent(in) : : psi (:,:}, C 
return = ( & 

F ( S 
pi (d, psi, i, j ) , & 
pi (d, psi, i + 1, j ) , & 
pi(d, C, i + h, j) & 



) - 
F ( 



pifd, 
pi(d, 
pi (d, 



psi 
psi, i, 
C, i-h 



end function 



listing F.12 (Fortran) 

subroutine donorcell_op (psi, n, C, i, j) 
class (arrvec_t ) , allocatable :: psi 
class (arrvec_t ) , pointer : : C 



integer, intent { in ) 
integer, intent ( in ) 



i(2), j(2) 



real(real_t) , pointer :: ptr ( : , : ) 

ptr => pi(0, psi%at (n+1 ) %p%a, i, j) 
ptr = pi(0, psi%at (n) %p%a, i, j) 

- donorcell(0, psi%at (n) %p%a, C%at(0)%p%a, 

- donorcell(l, psi%at (n) %p%a, C%at{l)%p%a, 
end subroutine 



j) 
i) 



351 end module 



listing F.13 (Fortran) 



Donor-cell solver (cf. Sect. 12.91 ) 



listing F.14 (Fortran) 



module solver_donorcell_m 
use donorcell_m 
use solver_m 
implicit none 

type, extends (solver_t) 
contains 



donorcell_t 



ctor => donorcell_ctor 
advop => donorcell_advop 



procedure 
procedure 
end type 

contains 



subroutine donorcell_ctor (this, bcx, bey, 
class (donorcell_t ) : : this 
class(bcd_t) , intent(in) , target 
integer, intent(in) : : nx, ny 
call solver_ctor (this, bcx, bey, nx, ny, 

end subroutine 

subroutine donorcell_advop (this) 

class (donorcell_t ) , target : : this 
class (arrvec_t ) , pointer : : C 

C => this%C 

call donorcell_op ( 

this %p si, this%n, C, this%i, this%j 

) 

end subroutine 
end module 



ny) 



bcx, bey 



MPDATA formulae (cf. Sect. IZTOI ) 

listing F .lb (Fortran) 



module mpdata_m 
use arrvec_m 
use arakawa_c_: 
use pi_m 
implicit none 
contains 



listing F.16 (Fortran) 



function mpdata_frac (nom, den) result (return) 



real(real_t) , intent(in) : : nom{ 
real(real_t) : : return(size (nom, 



, : ) , den ( : , : ) 
1 ) , size (nom. 



2) ) 



where (den 
return = 

elsewhere 
return = 

end where 
end function 



> 0) 

nom / den 



listing F.17 (Fortran) 



function mpdat a_A (d, psi , i , j ) result (return) 
integer : : d 

real ( real_t ) , allocatable, intent (in) :: psi ( : , ; ) 
integer, intent (in) :: i(2), j{2) 
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400 
401 
402 
403 
404 
405 



real(real_t) : : return(span (d, i, 

return = mpdata_f rac ( 

pi (d, psi, i + 1 , j) - pi (d, psi, 
pi (d, psi, i+1, j) + pi (d, psi, 

) 

end function 



j) , spanfd, j, i) } 
& 

i# j)/ & 
i, j) & 



406 
407 
408 
409 
410 
411 
412 
413 
414 
415 
416 
417 



listing F. 

function mpdata_B(d, psi, 
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(Fortran) _ 
j) result 



integer : : d 

real (real_t) , allocatable, intent(in) 
integer, intent (in) :: i(2), j ( 2 ) 



real (real_t) 
return 

pifd, 
- pi(d, 

pifd, 
+ pi (d, 
) / 2 
end function 



return(span (d, 

mpdata_f rac ( 



j), 



psi, 
psi, 
psi, 
psi, 



i + 1, 
i + 1, 
i + 1, 
i + 1, 



j+D 
j-D 
j+D 
j-D 



pi{d, 
pi(d, 
pi{d, 
pi(d, 



psi, 
psi, 
psi, 
psi, 



(return) 

: : psi ( : 
span (d, 

j+D 

j-D. 

j+D 



j, i)) 



418 
419 
420 
421 
422 
423 
424 
425 
42d 
427 
428 



listing F.19 

function mpdata_C_bar (d, C, 



(Fortran) 

i, j) result 



(return) 
integer : : d 

real (real_t) , allocatable, intent(in) : : C ( : , : } 
integer, intent (in) :: i(2), j(2) 
real (real_t) : : return(span (d, i, j } , span (d 



return = { 

pi (d, C, i+1, 
pifd, C, i+1, 
) / 4 
end function 



j+h) 
j-h) 



pi(d, 
pi(d. 



j+h) 
j-h) 



j, i)) 
& 



429 
430 
431 
432 
433 
434 
435 
436 
437 
438 
439 
440 
441 
442 



listing F.20 (Fortran) 

function mpdata_C_adf (d, psi, i, j , C) result (return) 
integer : : d 

integer, intent (in) :: i(2), j (2) 

real (real_t) : : return(span (d, i, j ) , span (d, 
real ( real_t ) , allocatable, intent (in) : : psi ( : 
class { arrvec_t ) , pointer : : C 
return = 

abs(pi{d, C%at(d)%p%a, i+h, j)) 

* (1 - abs(pi(d, C%at(d)%p%a, i+h, j)}) 

* mpdata_A {d, psi , i , j ) 
- pi(d, C%at(d)%p%a, i+h, j) 

* mpdata_C_bar (d, C%at {d- 1 ) %p%a, i, j) 

* mpdata_B (d, psi, i, j } 
end function 



j, i)) 



listing F.21 (Fortran) 



443 end module 



An MPDATA solver (cf. Sect. [ZTTt 

listing F.22 (Fortran) 



444 
445 
446 
447 
448 
449 
450 
451 
452 
453 
454 
455 
456 
457 
458 
459 
460 
461 
462 
463 
464 
465 
466 
467 
468 
469 
470 
471 
472 



module solver_mpdata_m 
use solver_m 
use mpdata_m 
use donorcell_m 
use halo_m 
implicit none 

type, extends (solver_t) : : mpdata_t 
integer : : n_iters, n_tmp 
integer : : im { 2 ) , jm { 2 ) 
class { arrvec_t ) , pointer : : tmp ( : ) 
contains 

procedure : : ctor => mpdata_ctor 
procedure : : advop => mpdata_advop 
end type 

contains 



subroutine mpdata_ctor (this 
class {mpdata_t ) : : this 
class(bcd_t) , target : : bcx, bey 
integer, intent (in) : : n_iters, nx 
integer : : c 



iters, bcx, bey, 



li y) 



ny 



call solver_ctor (this, bcx, bey, nx, ny, 1) 

this%n_iters = n_iters 
this%n_trrtp = min (n_iters - 1, 2) 

if (n_iters > ) allocate (this % tmp ( : this%n_tmp) ) 



associate (i => this%i, j => this%j, hlo => this%hlo 
do c=0, this%n_tmp - 1 
call this%tmp ( c) Ictor { 2 ) 

call this%tmp (c) %init (0, ext{i, h) , ext ( j , hlo)) 
call this%tmp { c) %init { 1 , ext { i , hlo ) , ext { j , h) ) 
end do 

this%im = (/ i(l) - 1, i(2) /) 
this%jm = (/ j (1) - 1, j (2) /) 
end associate 
end subroutine 

subroutine mpdata_advop (this ) 

class (mpdata_t ) , target : : this 
integer : : step 

associate (i => this%i, j => this%j, im => this%im, & 
jm => this%jm, psi => this%psi, n => this%n, & 
hlo => this%hlo, bcx => this%bcx, bey => this%bcy& 

) 

do step=0, this%n_iters-l 
if (step == 0) then 
block 

class { arrvec_t ) , pointer : : C 

C => this%C 

call donorcell_op (psi, n, C, i, j) 
end block 
else 

call this%cycle{) 

call bcx%f ill_halos { s 
psi%at ( n ) %p%a, ext(j, hlo) & 

) 

call bcy%f ill_halos { s 
psi%at ( n ) %p%a, ext (i, hlo) & 

) 

block 

class { arrvec_t ) , pointer : : C_corr, C_unco 
real (real_t) , pointer : : ptr ( : , : ) 

.' chosing input/output for antidlff. C 
if (step == 1) then 

C_unco => this%C 

C_corr => this%tmp ( ) 
else if (mod (step, 2 ) == 1 ) then 

C_unco => this%tmp { 1 ) odd step 

C_corr => this%tmp ( ) .' even step 
else 

C_unco => this%tmp{0) odd step 
C_corr => this%tmp ( 1 ) .' even step 
end if 

.' calculating the antidiffuslve velo 
ptr => pi(0, C_corr%at ( ) %p%a, im+h, j) 
ptr = mpdata_C_adf { & 
0, psi%at ( n ) %p%a, im, j, C_unco & 

) 

call bcy%f ill_halos { & 
C_corr%at (0) %p%a, ext (i, h) & 

) 



ptr => pi (0, C_corr%at { 1 ) %p%a, i, 
ptr = mpdata_C_adf { 

1, psi%at ( n ) %p%a, jm, i, C_unco 

) 

call bcx%f ill_halos { 

C_corr%at { 1 ) %p%a, ext { j , h) 

) 

.' donor-cell step 

call donorcell_op (psi, n, C_corr, i, 
end block 
end if 
end do 
end associate 
end subroutine 
end module 



jm+h) 



Usage example (cf. Sect. \2.12\ 

s £. - listing l r . 23 (Fortran) 

type (mpdata_t ) :: slv 
type { cyclic_t ) , target : : bcx, bey 
integer : : nx, ny, nt, it 
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real(real_t) Cx, Cy 

real (real_t) , pointer :: ptr ( : , : ) 



listing F.24 (Fortran) 

call slv%ctor(it, box, bey, nx, ny) 

ptr => slv%state() 

call read_f ile {fname, ptr) 



ptr => slv%courant { ) 
ptr = Cx 

ptr => slv%courant { 1 ) 
ptr = Cy 

call slv%solve {nt ) 
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